Average Localised Proximity: A new data descriptor with good default one-class classification performance

Authors

Abstract

One-class classification is a challenging subfield of machine learning in which so-called data descriptors are used to predict membership of a class based solely on positive examples of that class, and no counter-examples. A number of data descriptors that have been shown to perform well in previous studies of one-class classification, like the Support Vector Machine (SVM), require the setting of one or more hyperparameters. There has been no systematic attempt to date to determine optimal default values for these hyperparameters, which limits their ease of use, especially in comparison with hyperparameter-free proposals like the Isolation Forest (IF). We address this issue by determining optimal default hyperparameter values across a collection of 246 problems derived from 50 different real-world datasets. In addition, we propose a new data descriptor, Average Localised Proximity (ALP), which addresses certain issues with existing approaches based on nearest neighbour distances. Finally, we evaluate classification performance using a leave-one-dataset-out procedure, and find strong evidence that ALP outperforms IF and the other data descriptors considered, as well as weak evidence that it outperforms SVM, making ALP a good default choice.
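The abstract only outlines the idea behind ALP. As a rough illustration of a nearest-neighbour-distance-based data descriptor in the same spirit, the sketch below scores query points by comparing their nearest-neighbour distances in the target data against a local distance scale. The hyperparameter names `k` and `l`, the definition of the local scale, and the plain mean aggregation are assumptions made here for illustration; they are not the exact ALP formula from the paper.

```python
# Illustrative sketch only: a simplified nearest-neighbour proximity score in the
# spirit of ALP, NOT the paper's exact definition. The hyperparameters `k` and `l`,
# the local scale, and the mean aggregation are assumptions for illustration.
import numpy as np
from sklearn.neighbors import NearestNeighbors


def fit_descriptor(X_target, k=5, l=5):
    """Index the target-class data and precompute a local distance scale per instance."""
    nn = NearestNeighbors(n_neighbors=max(k, l)).fit(X_target)
    # Distance of each training instance to its own l nearest neighbours
    # (excluding itself), averaged to give a local scale for its neighbourhood.
    dists, _ = nn.kneighbors(X_target, n_neighbors=l + 1)
    local_scale = dists[:, 1:].mean(axis=1)            # shape: (n_target,)
    return {"nn": nn, "local_scale": local_scale, "k": k}


def proximity_scores(model, X_query):
    """Higher score = more similar to the target class."""
    k = model["k"]
    dists, idx = model["nn"].kneighbors(X_query, n_neighbors=k)
    scales = model["local_scale"][idx]                 # local scale of each neighbour
    # Proximity of the query to each neighbour, relative to that neighbour's scale,
    # aggregated here with a plain mean (a deliberate simplification).
    prox = scales / (scales + dists + 1e-12)
    return prox.mean(axis=1)


# Tiny usage example on synthetic data (for illustration only).
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 2))                    # positive examples only
X_test = np.vstack([rng.normal(size=(5, 2)),           # in-distribution queries
                    rng.normal(loc=6.0, size=(5, 2))])  # outlying queries
model = fit_descriptor(X_train, k=5, l=5)
print(proximity_scores(model, X_test))                 # higher for the first five
```

In this simplified form, scores near 1 indicate that a query lies within distances typical of its neighbourhood in the target class, while scores near 0 indicate relative isolation.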


Similar references

Learning without Default: A Study of One-Class Classification and the Low-Default Portfolio Problem

This paper asks at what level of class imbalance one-class classifiers outperform two-class classifiers in credit scoring problems in which class imbalance, referred to as the low-default portfolio problem, is a serious issue. The question is answered by comparing the performance of a variety of one-class and two-class classifiers on a selection of credit scoring datasets as the class imbalance...


Incremental One-Class Models for Data Classification

In this paper we outline a PhD research plan. This research contributes to the field of one-class incremental learning and classification in non-stationary environments. The goal of this PhD is to define a new classification framework able to deal with a very small learning dataset at the beginning of the process, with the ability to adjust itself according to the variability of the inco...


Discriminating Against New Classes: One-class versus Multi-class Classification

Many applications require the ability to identify data that is anomalous with respect to a target group of observations, in the sense of belonging to a new, previously unseen ‘attacker’ class. One possible approach to this kind of verification problem is one-class classification, learning a description of the target class concerned based solely on data from this class. However, if known non-tar...


Classification on Proximity Data with LP–Machines

We provide a new linear program to deal with the classification of data in the case of functions written in terms of pairwise proximities. This makes it possible to avoid the problems inherent in using feature spaces with an indefinite metric in Support Vector Machines, since the notion of a margin is only needed in the input space, where the classification actually occurs. Moreover, in our approach we can enforce s...


Classification of Asymmetric Proximity Data

When clustering asymmetric proximity data, only the average amounts are often considered by assuming that the asymmetry is due to noise. But when the asymmetry is structural, as typically may happen for exchange flows, migration data or confusion data, this may strongly affect the search for the groups because the directions of the exchanges are ignored and not integrated in the clustering proc...


Journal

Journal: Pattern Recognition

Year: 2021

ISSN: 0031-3203, 1873-5142

DOI: https://doi.org/10.1016/j.patcog.2021.107991